Statistics of motifs

Author

  • Sophie SCHBATH

Abstract

In this lecture we focus essentially on the statistical analysis of the number of overlapping occurrences (the count) of a given oligonucleotide (a word), or of a given degenerate oligonucleotide (a motif, or word family), in a DNA sequence. The methods are not restricted to sequences over a 4-letter alphabet. Related topics are only mentioned at the end, with appropriate references. Note also that this lecture is part of a more complete presentation published in the book DNA, Words and Models (Robin et al., 2003, 2005), which contains many more references.

The question we would like to address is: “does this word occur in this sequence with the expected frequency?” In other words, could we observe it so many times, or so few times, just by chance? When the answer is no, the word becomes a candidate for a particular biological role; only a candidate, since statistical significance is not equivalent to biological significance.

As a guiding example, we will look at the occurrences of the octamer gctggtgg in the complete genome of Escherichia coli (leading strands). This word is known as the Chi motif of the bacterium; it is very frequent, with 762 occurrences on the leading strands, and it is necessary for the stability of the chromosome. Consider the following simple calculation: if all 4^8 = 65536 octamers had the same occurrence probability in a sequence of length 4638858, one would expect to see each of them 4638851/4^8 ≃ 70 times, where 4638851 = 4638858 − 8 + 1 is the number of positions at which an octamer can start. At this point, the Chi motif seems strongly over-represented in E. coli, because we are comparing 762 observed occurrences with 70 expected ones. The key idea is indeed to compare the observed count with the count we could expect given some knowledge of the sequence. To decide whether a word count is expected or not, we need to know what to expect. This is defined by a probabilistic model, i.e. by a description of what is “random”. After choosing the appropriate model (Section 2), ...
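The back-of-the-envelope calculation above, and the general idea of comparing an observed count with a model-based expectation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own method: the helper names `count_overlapping`, `count_family`, `expected_uniform` and `expected_bernoulli` are hypothetical; a word family is assumed to be a set of words of equal length; and the i.i.d. (Bernoulli) model with letter frequencies estimated from the sequence is just one simple choice of what is "random". The lecture's actual model choice (its Section 2) is not part of this excerpt.

```python
from collections import Counter
from math import prod


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, overlaps allowed ('aa' occurs 3 times in 'aaaa')."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_family(seq: str, family: set[str]) -> int:
    """Count start positions where any member of a word family occurs.

    Assumption: all members share the same length k (a fixed-length degenerate motif).
    """
    k = len(next(iter(family)))
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in family)


def expected_uniform(n: int, k: int, alphabet_size: int = 4) -> float:
    """Expected count of one given k-mer when all letters are equiprobable:
    (n - k + 1) start positions, each matching with probability 1 / alphabet_size**k."""
    return (n - k + 1) / alphabet_size ** k


def expected_bernoulli(seq: str, word: str) -> float:
    """Expected count under an assumed i.i.d. letter model whose letter
    frequencies are estimated from `seq` itself."""
    n, k = len(seq), len(word)
    freq = Counter(seq)
    p_word = prod(freq[c] / n for c in word)
    return (n - k + 1) * p_word


if __name__ == "__main__":
    # The Chi-motif calculation from the text: 4638851 / 4**8.
    n_ecoli, chi = 4_638_858, "gctggtgg"
    e = expected_uniform(n_ecoli, len(chi))
    print(f"expected under uniform model: {e:.1f}")  # prints 70.8
    print(f"observed/expected ratio: {762 / e:.1f}")
```

With the real genome (not included here), `count_overlapping(genome, "gctggtgg")` would be compared with `expected_bernoulli(genome, "gctggtgg")`. The uniform model ignores base composition entirely; the Bernoulli model accounts for it, and higher-order Markov models can additionally account for neighbouring-letter dependencies.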

Similar resources

Fast Approximate Motif Statistics

We present in this article a fast approximate method for computing the statistics of the number of non-self-overlapping matches of motifs in a random text in the nonuniform Bernoulli model. This method is well suited for protein motifs where the probability of self-overlap of motifs is small. For 96% of the PROSITE motifs, the expectations of occurrences of the motifs in a 7-million-amino-acids r...

Typology of Natural Motifs of the Kurmanji Kilim

After a survey of one hundred and thirty samples of Kurmanji Kilims and the extraction of their natural motifs, this study proposes a typology of these motifs and characterizes their methods of composition. The data reviewed in this article were gathered through field and documentary research. The patterns of the Kurmanji Kilim can be categorized into four groups: 1. Natural motifs, 2....

Visual investigation of animal motifs in Tall-i Bakun’s earthenware

The investigation of prehistoric pottery is one of the sources that provide a great deal of information about the life and art of that time. Among Iranian arts, pottery is of greater antiquity, originality and importance, and represents people’s culture, beliefs and traditions more than any other art. After basket weaving, pottery was the first lasting art of mankind. By investigating earthenware moti...

A Study of the Form and Motif of the Flower Pot in Qajar Mosque-Schools Compared with Safavid Schools (with Emphasis on the Motifs of the New Sepahsalar Mosque-School)

The flower pot is one of the most frequently used motifs of the Qajar period. This decoration is abundantly found in the carvings and tilework of Qajar-period mosque-schools. The flower pot also appears in schools of earlier eras, such as Safavid-period schools, but the flower pots used in Safavid schools differ from those used in Qajar mosque-schools. However, this s...

A Study of the Motifs and Techniques of Needlework in Eight Prayer Rugs from the Qajar Period

As a sign of respect for God, Muslims' prostration during their prayer inspired the artists of Qajar period to represent the worshiper’s spiritual connection with the Divine through needlework on the textiles that were used for producing prayer mats. Needlework is a traditional art which, like any other traditional art, is characterized by its contribution to human transcendence, being rooted i...

Publication date: 2006